All the results were filtered by the adj.p-value < 0.05 and logFC >= 0.3
## An object of class Seurat
## 48361 features across 12617 samples within 1 assay
## Active assay: RNA (48361 features, 0 variable features)
For single cell methods I am going to consider both Seurat and Scanpy
For Scanpy I take in account the methods: - t-test_overestim_var - wilcoxon
Unfortunately the logistic regression provided by scanpy does not give LogFoldChanges and will not be taken in account.
## INFO [2023-01-13 11:51:32] $x
## INFO [2023-01-13 11:51:32] list(scanpy_t$Symbol, scanpy_wx$Symbol)
## INFO [2023-01-13 11:51:32]
## INFO [2023-01-13 11:51:32] $category.names
## INFO [2023-01-13 11:51:32] c("scanpy_t", "scanpy_wx")
## INFO [2023-01-13 11:51:32]
## INFO [2023-01-13 11:51:32] $filename
## INFO [2023-01-13 11:51:32] NULL
## INFO [2023-01-13 11:51:32]
## INFO [2023-01-13 11:51:32] $disable.logging
## INFO [2023-01-13 11:51:32] T
## INFO [2023-01-13 11:51:32]
The t-test_overestim_var recognized as significative almost all the genes and include almost all the genes from the wilcoxon test. Nolw let’s evaluate the expression of the genes.
Top upregulated genes in Exhausted NK cells:
Top downregulated genes in Exhausted NK cells:
Genes with lower abs(logFC)
Apparently Scanpy has a bias evaluating the logFC of
genes that are completely missing in one of the conditions.
But The genes are correctly plotted in the report. So I will check the expression
by the z-score instead of the LogFC:
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
Better result for the extremes logFC (scores), but still not clear results for the low scored genes. Indeed those genes are expressed by a small proportion of cells:
## Symbol pvals_adj logfoldchanges abs_lfc scores NK.exhausted
## 533 RDH5 0.04959774 0.5395693 0.5395693 3.083510 0.029618321
## 534 CCDC102A 0.04983460 0.3513908 0.3513908 3.081501 0.063053435
## 535 AC016405.3 0.04970821 -1.7653115 1.7653115 -3.082647 0.001221374
## NK.resident
## 533 0.019943959
## 534 0.049447833
## 535 0.004450305
## [1] "Upregulated GO terms"
## [1] "Downregulated GO terms"
GO terms can be linked to immune cell activity, down terms (up for resident NK cells) are correlated to immune activation.
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
## [1] "Upregulated GO terms"
## [1] "Downregulated GO terms"
Better differentially expressed genes. GO looks quite similar.
For Seurat I take in account the methods: - t-test - wilcoxon - logistic regression
Similar numbers of DE genes were found by Seurat’s different flavours.
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
## [1] "Upregulated GO terms"
## [1] "Downregulated GO terms"
Better result with seurat, the differences of expression are small (same cell types, just different state). In average the different gene expression is better appreciable
Enriched GO terms from upregulated genes are specific of immune activation. Downregulated GO terms are inherent to response to various factor, could it be a hint of the missing response to the cancer cell in exhausted NK cell?
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
## [1] "Upregulated GO terms"
## [1] "Downregulated GO terms"
Similar to the previous
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
As above.
In the pseudobulks, counts were summed for each celltype and sample, in practice obtaining a column of counts for each celltype and sample. Then, genes that weren’t showing counts in more than 90% of the cells were removed. Also, genes showing expression below 10 counts in all the pseudobulks were removed (pseudobulks are the sum of cells counts. showing less than 10 counts indicate almost all the cells didn’t had count’s for that gene).
## [1] 12421 145
For DESeq2, firstly I run it with the “LRT” methods that showed better results here. The PCA plot looks good but the clustering mix some samples here. I run DESeq2 specifing the model to take account of the batches.
Then, I tested combat_seq to correct the batch effect here. Also, I decided to
see if there was a batch effect afrom unknown sources using the single
variable analysis (sva) here.
## INFO [2023-01-13 11:51:35] $x
## INFO [2023-01-13 11:51:35] list(deseq2$Symbol, deseq2_combat$Symbol, deseq2_sva$Symbol)
## INFO [2023-01-13 11:51:35]
## INFO [2023-01-13 11:51:35] $category.names
## INFO [2023-01-13 11:51:35] c("deseq2", "deseq2_combat", "deseq2_sva")
## INFO [2023-01-13 11:51:35]
## INFO [2023-01-13 11:51:35] $filename
## INFO [2023-01-13 11:51:35] NULL
## INFO [2023-01-13 11:51:35]
## INFO [2023-01-13 11:51:35] $disable.logging
## INFO [2023-01-13 11:51:35] T
## INFO [2023-01-13 11:51:35]
DESeq2 without batch correction found alomst 500 DE genes,
sva that corrected by batch effect without prion knowledge
found 411. Combat correction found only 37 genes. Around half of the
genes found by DESeq2 and DESeq2+sva are shared.
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
## [1] "Upregulated GO terms"
## [1] "Downregulated GO terms"
Genes LogFC are slightly appreciable for the expreme values. Genes with low abs(LogFC) are not visible. The Go terms are quite similar to those found with seurat.
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
## [1] "Upregulated GO terms"
## [1] "Downregulated GO terms"
combat correction better better identified genes with low abs(LogFC) but didn’t identified clearly the genes with low LogFC. The Go terms looks less specific for cell response (down).
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
## [1] "Upregulated GO terms"
## [1] "Downregulated GO terms"
Similarly to DESeq2, the genes with abs(LogFC) are not well apreciable. The Down regulated GO terms look more specific to activation of immune cells.
Normal and with combat
## INFO [2023-01-13 11:51:35] $x
## INFO [2023-01-13 11:51:35] list(limma$Symbol, limma_combat$Symbol)
## INFO [2023-01-13 11:51:35]
## INFO [2023-01-13 11:51:35] $category.names
## INFO [2023-01-13 11:51:35] c("limma", "limma_combat")
## INFO [2023-01-13 11:51:35]
## INFO [2023-01-13 11:51:35] $filename
## INFO [2023-01-13 11:51:35] NULL
## INFO [2023-01-13 11:51:35]
## INFO [2023-01-13 11:51:35] $disable.logging
## INFO [2023-01-13 11:51:35] T
## INFO [2023-01-13 11:51:35]
Limma found more than 34 genes, no differences with the batch corrected data
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
Low LogFC aren’t well apreciable
## [1] "Upregulated GO terms"
## [1] "Downregulated GO terms"
GO terms similar to above
I tested edgeR with LRT as best option from here, I tested combat too. edgeR edgeR_combat
## INFO [2023-01-12 16:42:35] $x
## INFO [2023-01-12 16:42:35] list(edger$Symbol, edger_combat$Symbol)
## INFO [2023-01-12 16:42:35]
## INFO [2023-01-12 16:42:35] $category.names
## INFO [2023-01-12 16:42:35] c("edgeR", "edgeR_combat")
## INFO [2023-01-12 16:42:35]
## INFO [2023-01-12 16:42:35] $filename
## INFO [2023-01-12 16:42:35] NULL
## INFO [2023-01-12 16:42:35]
## INFO [2023-01-12 16:42:35] $disable.logging
## INFO [2023-01-12 16:42:35] T
## INFO [2023-01-12 16:42:35]
Both the methods found more than 9k genes with some hundreds uniquel per each method.
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
## [1] "Upregulated GO terms"
## [1] "Downregulated GO terms"
High and low LogFC aren’t clear. Low abs(logFC) are better appreciable. BP terms looks similar to other methods
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
## [1] "Upregulated GO terms"
## [1] "Downregulated GO terms"
DE genes similar to EdgeR. BP as above
Tested a non-parametric test with a handwritten wilcoxon test (inspired from this)
## [1] "Top upregulated genes"
## [1] "Top downregulated genes"
## [1] "lowest abs(LFC) genes"
## [1] "Upregulated GO terms"
## [1] "Downregulated GO terms"